Empty Categories in Hindi Dependency Treebank: Analysis and Recovery

نویسندگان

  • Chaitanya G. S. K.
  • Samar Husain
  • Prashanth Mannem
چکیده

In this paper, we first analyze and classify the empty categories in a Hindi dependency treebank and then identify various discovery procedures to automatically detect the existence of these categories in a sentence. For this we make use of lexical knowledge along with the parsed output from a constraint based parser. Through this work we show that it is possible to successfully discover certain types of empty categories while some other types are more difficult to identify. This work leads to the state-of-the-art system for automatic insertion of empty categories in the Hindi sentence.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Statistical Approach to Prediction of Empty Categories in Hindi Dependency Treebank

In this paper we use statistical dependency parsing techniques to detect NULL or Empty categories in the Hindi sentences. We have currently worked on Hindi dependency treebank which is released as part of COLINGMTPIL 2012 Workshop. Earlier Rule based approaches are employed to detect Empty heads for Hindi language but statistical learning for automatic prediction is not explored. In this approa...

متن کامل

Using CCG categories to improve Hindi dependency parsing

We show that informative lexical categories from a strongly lexicalised formalism such as Combinatory Categorial Grammar (CCG) can improve dependency parsing of Hindi, a free word order language. We first describe a novel way to obtain a CCG lexicon and treebank from an existing dependency treebank, using a CCG parser. We use the output of a supertagger trained on the CCGbank as a feature for a...

متن کامل

Empty Categories in a Hindi Treebank

We are in the process of creating a multi-representational and multi-layered treebank for Hindi/Urdu (Palmer et al., 2009), which has three main layers: dependency structure, predicate-argument structure (PropBank), and phrase structure. This paper discusses an important issue in treebank design which is often neglected: the use of empty categories (ECs). All three levels of representation make...

متن کامل

Empty Argument Insertion in the Hindi PropBank

This paper examines both linguistic behavior and practical implications of empty argument insertion in the Hindi PropBank. The Hindi PropBank is annotated on the Hindi Dependency Treebank, which contains some empty categories but rarely the empty arguments of verbs. In this paper, we analyze four kinds of empty arguments, *PRO*, *REL*, *GAP*, *pro*, and suggest effective ways of annotating thes...

متن کامل

Keeping it Simple: Generating Phrase Structure Trees from a Hindi Dependency Treebank

Converting a treebank from one representation type to another poses several challenges [4] [3]. These challenges are contingent on (amongst other things) the information encoded in source representation and the information required in target representation. In this paper, we propose a conversion algorithm that converts the Hindi-Urdu Dependency Treebank (HUTB) to a Phrase Structure (PS) represe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011